perf(handrail_detect): cache model indices and hoist wTc.inverse() out of loops #832

Open
killerdevildog wants to merge 1 commit into nasa:develop from killerdevildog:perf/handrail-detect-hotpath-optimization
@killerdevildog

What

Two small hot-path fixes in GazeboSensorPluginHandrailDetect::SendRegistration():

  1. Hoist wTc.inverse(): it was recomputed on every iteration of both the scan loop and the num_samp sampling loop. It is now computed once per tick.
  2. Cache handrail model indices: the full world model list was scanned every tick to find handrails. A new RefreshHandrailModelIndices() method caches the indices and rescans only when ModelCount() changes (typically once, at startup).
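The caching idea in item 2 can be sketched in isolation. This is a minimal, dependency-free illustration and not the code in this PR: `World`, `HandrailIndexCache`, and the `"handrail"` substring match are hypothetical stand-ins for the Gazebo world and for `RefreshHandrailModelIndices()`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the simulator world: only model names matter here.
struct World {
  std::vector<std::string> model_names;
  std::size_t ModelCount() const { return model_names.size(); }
};

// Caches the indices of models whose name contains "handrail".
// The class name and the substring match are assumptions for illustration.
class HandrailIndexCache {
 public:
  // Rescans the world only when the model count differs from the last scan.
  // Returns true if a rescan happened, false if the cache was reused.
  bool Refresh(const World& world) {
    if (valid_ && world.ModelCount() == cached_count_) return false;
    indices_.clear();
    for (std::size_t i = 0; i < world.model_names.size(); ++i) {
      if (world.model_names[i].find("handrail") != std::string::npos) {
        indices_.push_back(i);
      }
    }
    cached_count_ = world.ModelCount();
    valid_ = true;
    return true;
  }

  const std::vector<std::size_t>& indices() const { return indices_; }

 private:
  std::vector<std::size_t> indices_;
  std::size_t cached_count_ = 0;
  bool valid_ = false;
};
```

One trade-off of invalidating on the model count alone: if one model is removed and another is added between two ticks, the count is unchanged and the cache is not rebuilt. That is acceptable when models are only spawned at startup.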

Why it matters

At the default config (rate = 5 Hz, num_samp = 1000), the inverse hoist alone eliminates 1 000 redundant matrix inversions per tick.
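For readers unfamiliar with the pattern, here is a rough sketch of the hoist. It is not the plugin code: `Rigid2` is a hypothetical 2D rigid transform standing in for `Eigen::Affine3d` so the example needs no dependencies, and `ToCameraNaive` / `ToCameraHoisted` are illustrative names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 {
  double x, y;
};

// Minimal 2D rigid transform (rotation + translation). Assumption: this
// stands in for Eigen::Affine3d, which the real plugin uses.
struct Rigid2 {
  double c, s;  // cosine and sine of the rotation angle
  Vec2 t;       // translation

  // Applies the transform to a point: R * p + t.
  Vec2 operator*(const Vec2& p) const {
    return {c * p.x - s * p.y + t.x, s * p.x + c * p.y + t.y};
  }

  // The inverse of (R, t) is (R^T, -R^T * t).
  Rigid2 inverse() const {
    return {c, -s, {-(c * t.x + s * t.y), -(-s * t.x + c * t.y)}};
  }
};

// Before: the loop-invariant inverse is recomputed for every point.
std::vector<Vec2> ToCameraNaive(const Rigid2& wTc,
                                const std::vector<Vec2>& world_pts) {
  std::vector<Vec2> out;
  out.reserve(world_pts.size());
  for (const Vec2& p : world_pts) out.push_back(wTc.inverse() * p);
  return out;
}

// After: the inverse is computed once; the loop only multiplies.
std::vector<Vec2> ToCameraHoisted(const Rigid2& wTc,
                                  const std::vector<Vec2>& world_pts) {
  const Rigid2 cTw = wTc.inverse();
  std::vector<Vec2> out;
  out.reserve(world_pts.size());
  for (const Vec2& p : world_pts) out.push_back(cTw * p);
  return out;
}
```

Both functions apply the same transform to each point, so the results should agree. The hoisted version does one inversion per call instead of one per point.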

Benchmark (synthetic stress: 4 000 models, 800 handrails, num_samp = 1000):

          Mean latency
  Before  42.2 µs
  After   20.7 µs (~2× faster)

Output is numerically identical.

Scope

  • One file changed: simulation/src/gazebo_sensor_plugin_handrail_detect/gazebo_sensor_plugin_handrail_detect.cc
  • No new dependencies, no behaviour change, no API or config changes

Note: The plugin is currently commented out in simulation/CMakeLists.txt. This change cleans up the plugin ahead of re-enabling it.

…t of loops

Two targeted optimizations to SendRegistration(), called on every timer tick.

Optimization 1: cache handrail model index list
Before: every tick iterated ALL world models doing a string search on each name.
After: RefreshHandrailModelIndices() builds a vector<size_t> of handrail indices
once and rebuilds only when GetWorld()->GetModelCount() changes.

Optimization 2: hoist wTc.inverse() out of both hot loops
Before: wTc.inverse() was called once per handrail candidate (scan loop) and once
per ray sample (sampling loop), so the same inverse was recomputed ~2800
times per tick with default parameters.
After: const Eigen::Affine3d cTw = wTc.inverse() is computed once before both
loops; all uses are replaced with the cheaper cTw * x multiply.

Benchmark (4000 world models, 800 handrail candidates, 2000 ray samples,
1000 iterations, -O3, Eigen 3, x86_64):
  Before: 42.2 us/tick
  After:  20.7 us/tick  (~2x speedup, output checksums identical)
killerdevildog force-pushed the perf/handrail-detect-hotpath-optimization branch from 1e73ee5 to bc46952 on March 6, 2026 02:59